Mining and Ranking Biomedical Synonym Candidates from Wikipedia

Authors

  • Abhyuday Jagannatha
  • Jinying Chen
  • Hong Yu
Abstract

Biomedical synonyms are important resources for natural language processing in the biomedical domain. Existing synonym resources (e.g., the UMLS) are incomplete, and manually expanding and enriching them is prohibitively expensive. We therefore develop and evaluate approaches for automated synonym extraction from Wikipedia. Using interwiki links, we extracted each candidate synonym pair as the anchor text of a link in a Wikipedia page (e.g., “increased thirst”) and the title of the linked page (e.g., “polyuria”). We rank synonym candidates with word embeddings and pseudo-relevance feedback (PRF). Our results show that PRF-based reranking outperformed the word-embedding-based approach and a strong baseline using interwiki link frequency. A hybrid method, Rank Score Combination, achieved the best results. Our analysis also suggests that medical synonyms mined from Wikipedia can increase the coverage of existing synonym resources such
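The candidate extraction step and the link-frequency baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes pages are available as raw wikitext using the standard `[[Target page|anchor text]]` link syntax, and the names `extract_candidates` and `rank_by_link_frequency` are hypothetical. The example pair is the one given in the abstract.

```python
import re
from collections import Counter

# Matches piped wikitext links of the form [[Target page|anchor text]].
# Assumption: an optional "#section" suffix on the target is ignored, and
# links without a pipe are skipped because their anchor equals the title.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]|]+)\]\]")


def extract_candidates(wikitext):
    """Return (anchor_text, target_title) pairs found in one page's wikitext."""
    pairs = []
    for title, anchor in LINK_RE.findall(wikitext):
        title, anchor = title.strip().lower(), anchor.strip().lower()
        if anchor != title:  # identical strings carry no synonym information
            pairs.append((anchor, title))
    return pairs


def rank_by_link_frequency(pages):
    """Baseline ranking: order candidate pairs by how often the link occurs."""
    counts = Counter()
    for text in pages:
        counts.update(extract_candidates(text))
    return counts.most_common()


# Toy input, invented for illustration only.
pages = [
    "Symptoms include [[Polyuria|increased thirst]] and fatigue.",
    "Patients report [[Polyuria|increased thirst]] or [[Polyuria|frequent urination]].",
]
ranked = rank_by_link_frequency(pages)
for (anchor, title), freq in ranked:
    print(f"{anchor} -> {title}: {freq}")
```

Raw frequency is only the baseline named in the abstract. The embedding-based and PRF-based rerankers would rescore the same candidate pairs, and their details are not given in this excerpt.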

Similar resources

Exploiting BabelNet for Multilingual Biomedical Synonym Expansion

Our challenge contribution for CLEF-ER consists in providing annotations for all three corpora of the challenge (Medline, EMEA, Patents) for the languages French and German. The objective of these experiments is to verify whether a general multilingual ontological resource such as BabelNet (http://babelnet.org) can be used to substantially enrich the terminology provided by the challenge organizer...

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task

Understanding the intent underlying a search query has recently attracted enormous research interest. Two challenging issues are worth noting: First, words within a query are usually ambiguous, while the query is in most cases too short to disambiguate them. Second, ambiguity in some cases cannot be resolved according merely to the limited query context. It is thus demanded that the ambiguity be resolved/analyzed w...

HIT2 Joint NLP Lab at the NTCIR-9 Intent Task

This report presents the principles, the search process, and the experimental results. We report our systems and experiments in the intent task of NTCIR-9. The research aims at evaluating the effectiveness of the proposed methods on query intent mining and results diversification in terms of web search. In the subtopic mining subtask, we combine the extracted candidates from search logs...

Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems

In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to Wikipedia articles that represent the meaning of these anchors has received extensive research attention. Typically, link-to-Wikipedia systems try to find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and, later, rank these candidates to select the most...

Ranking relations between diseases, drugs and genes for a curation task

BACKGROUND: One of the key pieces of information which biomedical text mining systems are expected to extract from the literature are interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Several large resources of curated relations between biomedical entities are currently available, such as the Pharmacogenomics Knowledge Base (PharmGKB) or the Comp...

Publication date: 2015